CMInject: Python framework for the numerical simulation of nanoparticle injection pipelines
CMInject simulates nanoparticle injection experiments of particles with diameters in the micrometer to nanometer regime, e.g., for single-particle-imaging experiments. Particle-particle interactions and particle-induced changes in the surrounding fields are disregarded, owing to the low nanoparticle concentrations in these experiments. CMInject's focus lies on the correct modeling of different forces on such particles, such as fluid-dynamical or light-induced interactions, to allow for simulations that further the scientific development of nanoparticle injection pipelines. To provide a usable basis for this framework and allow a variety of experiments to be simulated, we implemented a first set of specific force models: fluid drag forces, Brownian motion, and photophoretic forces. For verification, we benchmarked a drag-force-based simulation against a nanoparticle focusing experiment. We envision its use and further development by experimentalists, theorists, and software developers.
Program summary:
Program Title: CMInject
CPC Library link to program files: https://doi.org/10.17632/rbpgn4fk3z.1
Developer's repository link: https://github.com/cfel-cmi/cminject
Code Ocean capsule: https://codeocean.com/capsule/5146104
Licensing provisions: GPLv3
Programming language: Python 3
Supplementary material: Code to reproduce and analyze simulation results, example input and output data, video files of trajectory movies
Nature of problem: Well-defined, reproducible, and interchangeable simulation setups of experimental injection pipelines for biological and artificial nanoparticles, in particular pipelines that aim to advance the field of single-particle imaging.
Solution method: The definition and implementation of an extensible Python 3 framework to model and execute such simulation setups, based on object-oriented software design and making use of parallelization facilities and modern numerical integration routines.
Additional comments including restrictions and unusual features: Supplementary executable scripts for quantitative and visual analyses of result data are also part of the framework.
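The fluid-drag force model listed above can be illustrated by a minimal explicit-Euler velocity update under Stokes drag (a generic sketch, not CMInject's actual API; the function name and particle parameters are ours, and the default viscosity value approximates air at room temperature):

```python
import numpy as np

def stokes_drag_step(v_particle, v_fluid, radius, mass, dt, mu=1.8e-5):
    """Advance a spherical particle's velocity by one explicit-Euler step
    under Stokes drag, F = 6*pi*mu*r*(v_fluid - v_particle).
    mu is the dynamic viscosity of the carrier gas in Pa*s."""
    force = 6.0 * np.pi * mu * radius * (v_fluid - v_particle)
    return v_particle + force / mass * dt
```

Iterated over many steps, the particle velocity relaxes toward the local fluid velocity with time constant m / (6*pi*mu*r); a production model for particles at these length scales would typically also include corrections such as slip effects.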
DiffPhase: Generative Diffusion-based STFT Phase Retrieval
Diffusion probabilistic models have been recently used in a variety of tasks,
including speech enhancement and synthesis. As a generative approach, diffusion
models have been shown to be especially suitable for imputation problems, where
missing data is generated based on existing data. Phase retrieval is inherently
an imputation problem, where phase information has to be generated based on the
given magnitude. In this work we build upon previous work in the speech domain,
adapting a speech enhancement diffusion model specifically for STFT phase
retrieval. Evaluation using speech quality and intelligibility metrics shows
the diffusion approach is well-suited to the phase retrieval task, with
performance surpassing both classical and modern methods.
Comment: Submitted to ICASSP 202
DriftRec: Adapting diffusion models to blind JPEG restoration
In this work, we utilize the high-fidelity generation abilities of diffusion
models to solve blind JPEG restoration at high compression levels. We propose
an elegant modification of the forward stochastic differential equation of
diffusion models to adapt them to this restoration task and name our method
DriftRec. Comparing DriftRec against a regression baseline with the same
network architecture and two state-of-the-art techniques for JPEG restoration,
we show that our approach can escape the tendency of other methods to generate
blurry images, and recovers the distribution of clean images significantly more
faithfully. For this, only a dataset of clean/corrupted image pairs and no
knowledge about the corruption operation is required, enabling wider
applicability to other restoration tasks. In contrast to other conditional and
unconditional diffusion models, we utilize the idea that the distributions of
clean and corrupted images are much closer to each other than each is to the
usual Gaussian prior of the reverse process in diffusion models. Our approach
therefore requires only low levels of added noise, and needs comparatively few
sampling steps even without further optimizations. We show that DriftRec
naturally generalizes to realistic and difficult scenarios such as unaligned
double JPEG compression and blind restoration of JPEGs found online, without
having encountered such examples during training.
Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement
Recently, score-based generative models have been successfully employed for
the task of speech enhancement. A stochastic differential equation is used to
model the iterative forward process, where at each step environmental noise and
white Gaussian noise are added to the clean speech signal. While in the limit the mean of the forward process ends at the noisy mixture, in practice the process stops earlier and thus reaches only an approximation of the noisy mixture. This results in
a discrepancy between the terminating distribution of the forward process and
the prior used for solving the reverse process at inference. In this paper, we
address this discrepancy and propose a forward process based on a Brownian
bridge. We show that such a process leads to a reduction of the mismatch
compared to previous diffusion processes. More importantly, we show that our
approach improves on the baseline process in objective metrics with only half the iteration steps and one fewer hyperparameter to tune.
Comment: 5 pages, 2 figures, Accepted to Interspeech 2022
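The Brownian-bridge construction can be sketched as follows (a generic illustration with a constant diffusion coefficient, not necessarily the exact parameterization used in the paper). For clean speech $x_0$ and noisy mixture $y$, the bridge SDE on $t \in [0, T)$ reads

```latex
dx_t = \frac{y - x_t}{T - t}\, dt + g\, dw_t,
\qquad
\mathbb{E}[x_t] = \Bigl(1 - \frac{t}{T}\Bigr) x_0 + \frac{t}{T}\, y,
```

so the process mean interpolates linearly between clean speech and noisy mixture and terminates exactly at $y$ when $t = T$, removing the prior mismatch described above.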
A Flexible Online Framework for Projection-Based STFT Phase Retrieval
Several recent contributions in the field of iterative STFT phase retrieval
have demonstrated that the performance of the classical Griffin-Lim method can
be considerably improved upon. By using the same projection operators as
Griffin-Lim, but combining them in innovative ways, these approaches achieve
better results in terms of both reconstruction quality and required number of
iterations, while retaining a similar computational complexity per iteration.
However, like Griffin-Lim, these algorithms operate in an offline manner and
thus require an entire spectrogram as input, which is an unrealistic
requirement for many real-world speech communication applications. We propose
to extend RTISI -- an existing online (frame-by-frame) variant of the
Griffin-Lim algorithm -- into a flexible framework that enables straightforward
online implementation of any algorithm based on iterative projections. We
further employ this framework to implement online variants of the fast
Griffin-Lim algorithm, the accelerated Griffin-Lim algorithm, and two
algorithms from the optics domain. Evaluation results on speech signals show
that, similarly to the offline case, these algorithms can achieve a
considerable performance gain compared to RTISI.
Comment: Submitted to ICASSP 2
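The projection operators shared by these methods can be sketched with a minimal offline Griffin-Lim loop (a generic illustration built on SciPy's STFT routines; the function name and parameters are ours, and this is the batch algorithm the contributions above improve upon, not the proposed online framework):

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=50, nperseg=256, noverlap=128, seed=0):
    """Classical Griffin-Lim phase retrieval: alternate between the
    consistency projection (ISTFT followed by STFT) and the magnitude
    projection (replace magnitudes, keep phases)."""
    rng = np.random.default_rng(seed)
    # Initialize the spectrogram with uniformly random phases.
    Z = mag * np.exp(1j * rng.uniform(-np.pi, np.pi, mag.shape))
    for _ in range(n_iter):
        _, x = istft(Z, nperseg=nperseg, noverlap=noverlap)    # project onto consistent spectrograms...
        _, _, Z = stft(x, nperseg=nperseg, noverlap=noverlap)  # ...via a round trip through the time domain
        Z = mag * np.exp(1j * np.angle(Z))                     # project onto the given magnitudes
    _, x = istft(Z, nperseg=nperseg, noverlap=noverlap)
    return x
```

Online variants such as RTISI apply the same two projections, but frame by frame with a small look-ahead buffer instead of over the entire spectrogram.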
Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration
Diffusion-based generative models have had a high impact on the computer
vision and speech processing communities these past years. Besides data
generation tasks, they have also been employed for data restoration tasks like
speech enhancement and dereverberation. While discriminative models have
traditionally been argued to be more powerful, e.g., for speech enhancement,
generative diffusion approaches have recently been shown to narrow this
performance gap considerably. In this paper, we systematically compare the
performance of generative diffusion models and discriminative approaches on
different speech restoration tasks. For this, we extend our prior contributions
on diffusion-based speech enhancement in the complex time-frequency domain to
the task of bandwidth extension. We then compare it to a discriminatively
trained neural network with the same network architecture on three restoration
tasks, namely speech denoising, dereverberation and bandwidth extension. We
observe that the generative approach performs globally better than its
discriminative counterpart on all tasks, with the strongest benefit for
non-additive distortion models, like in dereverberation and bandwidth
extension. Code and audio examples can be found online at
https://uhh.de/inf-sp-sgmsemultitask
Comment: Submitted to ICASSP 202
Speech Enhancement and Dereverberation with Diffusion-based Generative Models
In this work, we build upon our previous publication and use diffusion-based
generative models for speech enhancement. We present a detailed overview of the
diffusion process that is based on a stochastic differential equation and delve
into an extensive theoretical examination of its implications. In contrast to usual
conditional generation tasks, we do not start the reverse process from pure
Gaussian noise but from a mixture of noisy speech and Gaussian noise. This
matches our forward process which moves from clean speech to noisy speech by
including a drift term. We show that this procedure enables using only 30
diffusion steps to generate high-quality clean speech estimates. By adapting
the network architecture, we are able to significantly improve the speech
enhancement performance, indicating that the network, rather than the
formalism, was the main limitation of our original approach. In an extensive
cross-dataset evaluation, we show that the improved method can compete with
recent discriminative models and achieves better generalization when evaluating
on a different corpus than used for training. We complement the results with an
instrumental evaluation using real-world noisy recordings and a listening
experiment, in which our proposed method is rated best. Examining different
sampler configurations for solving the reverse process allows us to balance the
performance and computational speed of the proposed method. Moreover, we show
that the proposed method is also suitable for dereverberation and thus not
limited to additive background noise removal. Code and audio examples are
available online, see https://github.com/sp-uhh/sgmse
Comment: Accepted version
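The drift-based forward process described above can be written schematically as follows (an Ornstein-Uhlenbeck-type parameterization in the spirit of the authors' formulation; $\gamma$ denotes a stiffness constant and $g(t)$ the noise schedule):

```latex
dx_t = \gamma\, (y - x_t)\, dt + g(t)\, dw_t,
\qquad
\mathbb{E}[x_t] = y + e^{-\gamma t}\, (x_0 - y),
```

so the process mean decays exponentially from the clean speech $x_0$ toward the noisy mixture $y$, which is why the reverse process can be initialized from a mixture-plus-Gaussian-noise prior rather than from pure Gaussian noise.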
EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data
Speech emotion conversion is the task of converting the expressed emotion of
a spoken utterance to a target emotion while preserving the lexical content and
speaker identity. While most existing works in speech emotion conversion rely
on acted-out datasets and parallel data samples, in this work we specifically
focus on more challenging in-the-wild scenarios and do not rely on parallel
data. To this end, we propose a diffusion-based generative model for speech
emotion conversion, the EmoConv-Diff, that is trained to reconstruct an input
utterance while also conditioning on its emotion. Subsequently, at inference, a
target emotion embedding is employed to convert the emotion of the input
utterance to the given target emotion. As opposed to performing emotion
conversion on categorical representations, we use a continuous arousal
dimension to represent emotions while also achieving intensity control. We
validate the proposed methodology on a large in-the-wild dataset, the
MSP-Podcast v1.10. Our results show that the proposed diffusion model is indeed
capable of synthesizing speech with a controllable target emotion. Crucially,
the proposed approach shows improved performance along the extreme values of
arousal and thereby addresses a common challenge in the speech emotion
conversion literature.
Comment: Submitted to ICASSP 202
Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain
Score-based generative models (SGMs) have recently shown impressive results
for difficult generative tasks such as the unconditional and conditional
generation of natural images and audio signals. In this work, we extend these
models to the complex short-time Fourier transform (STFT) domain, proposing a
novel training task for speech enhancement using a complex-valued deep neural
network. We derive this training task within the formalism of stochastic
differential equations, thereby enabling the use of predictor-corrector
samplers. We provide alternative formulations inspired by previous publications
on using SGMs for speech enhancement, avoiding the need for any prior
assumptions on the noise distribution and making the training task purely
generative which, as we show, results in improved enhancement performance.
Comment: Submitted to INTERSPEECH 202
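The stochastic-differential-equation formalism referenced here follows the standard score-based generative modeling setup of a forward SDE and its reverse-time counterpart (a generic statement of the framework, not the paper's specific parameterization):

```latex
dx = f(x, t)\, dt + g(t)\, dw,
\qquad
dx = \bigl[f(x, t) - g(t)^2\, \nabla_x \log p_t(x)\bigr]\, dt + g(t)\, d\bar{w},
```

where the score $\nabla_x \log p_t(x)$ is approximated by the trained network; predictor-corrector samplers alternate a discretized reverse-SDE step (predictor) with score-based Langevin correction steps.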
BEAR reveals that increased fidelity variants can successfully reduce the mismatch tolerance of adenine but not cytosine base editors
Base editors allow for precision engineering of the genome. Here, the authors present BEAR, a plasmid-based fluorescence assay for the measurement of CBE and ABE activity, to reveal the mechanism underlying their differences and to increase the yield of edited cells with reduced indel background.